Abstract

We present a streaming model for large-scale classification
(in the context of the $\ell_2$-SVM) by leveraging connections
between learning and computational geometry. The streaming
model imposes the constraint that only a single pass over the
data is allowed. The $\ell_2$-SVM is known to have an equivalent
formulation in terms of the minimum enclosing ball (MEB)
problem, and an efficient algorithm based on the idea of core
sets exists (CVM) [Tsang et al., 2005]. CVM learns a
$(1 + \epsilon)$-approximate MEB for a set of points and yields
an approximate solution to the corresponding SVM instance.
However, CVM works in batch mode, requiring multiple passes
over the data. This paper presents a single-pass SVM which is
based on the minimum enclosing ball of streaming data. We show
that the MEB updates for the streaming case can be easily
adapted to learn the SVM weight vector in a way similar to
using online stochastic gradient updates. Our algorithm
performs polylogarithmic computation at each example, and
requires very small and constant storage. Experimental results
show that, even in such restrictive settings, we can learn
efficiently in just one pass and get accuracies comparable to
other state-of-the-art SVM solvers (batch and online). We also
give an analysis of the algorithm, and discuss some open issues
and possible extensions.

1 Introduction 

Learning in a streaming model imposes constraints on both time
and storage. Such scenarios are quite common, for example when
analyzing network traffic data that arrives in a streamed
fashion at a very high rate. The streaming model also applies
to disk-resident datasets that are too large to be stored in
memory. Unfortunately, standard learning algorithms do not
scale well for such cases. To address such scenarios, we
propose applying the stream model of computation
[Muthukrishnan, 2005] to supervised learning problems. In the
stream model, we are allowed only one pass (or a small number
of passes) over an ordered data set, and polylogarithmic
storage and polylogarithmic computation per element.



In spite of the severe limitations imposed by the streaming
framework, streaming algorithms have been successfully employed
in many different domains [Guha et al., 2003]. Many of the
problems in geometry can be adapted to the streaming setting
and, since many learning problems have equivalent geometric
formulations, streaming algorithms naturally motivate the
development of efficient techniques for solving (or
approximating) large-scale batch learning problems.

In this paper, we study the application of the stream model
to the problem of maximum-margin classification, in the
context of $\ell_2$-SVMs [Vapnik, 1998; Cristianini and
Shawe-Taylor, 2000]. Since the support vector machine is a
widely used classification framework, we believe success here
will encourage further research into other frameworks. SVMs are
known to have a natural formulation in terms of the minimum
enclosing ball problem in a high dimensional space [Tsang et
al., 2005; 2007]. This latter problem has been extensively
studied in the computational geometry literature and admits
natural streaming algorithms [Zarrabi-Zadeh and Chan, 2006;
Agarwal et al., 2004]. We adapt these algorithms to the
classification setting, provide some extensions, and outline
some open issues. Our experiments show that we can learn
efficiently in just one pass and get competitive classification
accuracies on synthetic and real datasets.

2 Scaling up SVM Training 

Support Vector Machines (SVMs) are maximum-margin kernel-based
linear classifiers [Cristianini and Shawe-Taylor, 2000] that
are known to provide provably good generalization bounds
[Vapnik, 1998]. Traditional SVM training is formulated in terms
of a quadratic program (QP) which is typically optimized by a
numerical solver. For a training set of N points, the typical
time complexity is $O(N^3)$ and the storage required is
$O(N^2)$; such requirements make SVMs prohibitively expensive
for large-scale applications. Typical approaches to large-scale
SVMs, such as chunking [Vapnik, 1998], decomposition methods
[Chang and Lin, 2001] and SMO [Platt, 1999], work by dividing
the original problem into smaller subtasks or by scaling down
the training data in some manner [Yu et al., 2003; Lee and
Mangasarian, 2001]. However, these approaches are typically
heuristic in nature: they may converge very slowly and do not
provide rigorous guarantees on training complexity [Tsang et
al., 2005]. There has been a recent surge of interest in the
online learning literature for SVMs due to the success of
various gradient descent approaches such as stochastic gradient
based methods [Zhang, 2004] and stochastic sub-gradient based
approaches [Shalev-Shwartz et al., 2007]. These methods solve
the SVM optimization problem iteratively in steps, are quite
efficient, and have very small computational requirements.
Another recent online algorithm, LASVM [Bordes et al., 2005],
combines online learning with active sampling and yields good
performance with a single pass (or more passes) over the data.
However, although fast and easy to train, most of the
stochastic gradient based approaches need more than a single
pass over the data: they usually require several iterations
before converging to a reasonable solution.

3 Two-Class Soft Margin SVM as the MEB 
Problem 

A minimum enclosing ball (MEB) instance is defined by a set
of points $x_1, \ldots, x_N \in \mathbb{R}^D$ and a metric
$d : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}^{\ge 0}$.
The goal is to find a point (the center) $c \in \mathbb{R}^D$
that minimizes the radius $R = \max_n d(x_n, c)$.
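
As a small illustration of this objective (not part of the original paper), the sketch below evaluates the radius $R = \max_n d(x_n, c)$ for a candidate center under the Euclidean metric. The function name `enclosing_radius` and the toy points are assumptions chosen for the example.

```python
import math

def enclosing_radius(points, center):
    """Radius of the smallest ball centered at `center` that contains all points."""
    return max(math.dist(p, center) for p in points)

# Three toy points in the plane (hypothetical data).
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]

# (1, 0) is the optimal center here: the first two points are 2 apart,
# so no ball of radius below 1 can contain both of them.
r_good = enclosing_radius(points, (1.0, 0.0))
# Any other center gives a larger radius.
r_bad = enclosing_radius(points, (0.0, 0.0))
```

Solving the MEB problem means searching over centers for the one that makes this quantity smallest.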

The 2-class $\ell_2$-SVM [Tsang et al., 2005] is defined by a
hypothesis $f(x) = w^T \varphi(x)$, and a training set consisting
of N points $\{z_n = (x_n, y_n)\}_{n=1}^N$ with $y_n \in \{-1, 1\}$
and $x_n \in \mathbb{R}^D$. The primal of the two-class
$\ell_2$-SVM (we consider the unbiased case; the extension is
straightforward) can be written as

s.t. $y_i (w^T \varphi(x_i)) \ge 1 - \xi_i, \quad i = 1, \ldots, N$ (2)

The only difference between the $\ell_2$-SVM and the standard
SVM is that the penalty term has the form $(C \sum_n \xi_n^2)$
rather than $(C \sum_n \xi_n)$.

We assume a kernel K with associated nonlinear feature
map $\varphi$. We further assume that K has the property
$K(x, x) = \kappa$, where $\kappa$ is a fixed constant [Tsang et
al., 2005]. Most standard kernels such as the isotropic, dot
product (normalized inputs), and normalized kernels satisfy
this criterion.
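
To make the constant-diagonal property concrete, here is a brief sketch (an illustration, not code from the paper) that checks $K(x, x) = \kappa$ for two kernels of the kinds mentioned above: an isotropic Gaussian (RBF) kernel and a normalized dot-product kernel. The function names, the value of `gamma`, and the sample points are assumptions.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Isotropic Gaussian kernel; depends only on the distance between x and z."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)

def normalized_linear_kernel(x, z):
    """Dot product of the inputs after scaling each to unit norm."""
    dot = sum(a * b for a, b in zip(x, z))
    nx = math.sqrt(sum(a * a for a in x))
    nz = math.sqrt(sum(b * b for b in z))
    return dot / (nx * nz)

samples = [(1.0, 2.0), (-3.0, 0.5), (10.0, -4.0)]
rbf_diag = [rbf_kernel(x, x) for x in samples]                # kappa = 1 for every x
lin_diag = [normalized_linear_kernel(x, x) for x in samples]  # kappa = 1 up to rounding
```

In both cases every mapped point has the same norm in feature space, which is what the MEB formulation relies on.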

Suppose we replace the mapping $\varphi(x_n)$ on $x_n$ by another
nonlinear mapping $\tilde\varphi(z_n)$ on $z_n$ such that (for
the unbiased case)

$\tilde\varphi(z_n) = [\, y_n \varphi(x_n);\; C^{-1/2} e_n \,]^T$ (3)

The mapping is done in such a way that the label information
$y_n$ is subsumed in the new feature map $\tilde\varphi$
(essentially, converting a supervised learning problem into an
unsupervised one). The first term in the mapping corresponds to
the feature term and the second term accounts for a
regularization effect, where C is the misclassification cost.
$e_n$ is a vector of dimension N, having all entries as zero,
except the n-th entry which is equal to one.
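
The following sketch (illustrative only) builds this augmented map explicitly for a linear kernel, where $\varphi(x) = x$, and checks two consequences: every mapped point has squared norm $\|x_n\|^2 + 1/C$, and the regularization coordinates of distinct examples are orthogonal. The helper names `augmented_map` and `dot`, the 0-based index `n`, and the toy data are assumptions.

```python
import math

def augmented_map(x, y, n, N, C):
    """Return [y * x ; C^{-1/2} e_n] for a linear kernel (n is 0-based here)."""
    e_n = [0.0] * N
    e_n[n] = 1.0 / math.sqrt(C)
    return [y * xi for xi in x] + e_n

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

C = 4.0
X = [(0.6, 0.8), (1.0, 0.0), (0.0, -1.0)]   # unit-norm inputs (hypothetical)
Y = [1, -1, 1]
Z = [augmented_map(x, y, n, len(X), C) for n, (x, y) in enumerate(zip(X, Y))]

sq_norms = [dot(z, z) for z in Z]   # each equals 1 + 1/C for unit-norm inputs
cross = dot(Z[0], Z[1])             # equals y_0 * y_1 * <x_0, x_1>; the e_n parts cancel
```

The streaming algorithm never materializes these N extra coordinates; they are shown here only to make the geometry visible.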

It was shown in [Tsang et al., 2005] that the MEB instance
$(\tilde\varphi(z_1), \tilde\varphi(z_2), \ldots, \tilde\varphi(z_N))$,
with the metric defined by the induced inner product, is dual
to the corresponding $\ell_2$-SVM instance (1). The weight
vector w of the maximum margin hypothesis can then be obtained
from the center c of the MEB using the constraints induced by
the Lagrangian [Tsang et al., 2007].



4 Approximate and Streaming MEBs 

The minimum enclosing ball problem has been extensively
studied in the computational geometry literature. An instance
of MEB, with a metric defined by an inner product, can be
solved using quadratic programming [Boyd and Vandenberghe,
2004]. However, this becomes prohibitively expensive as the
dimensionality and cardinality of the data increase; for an
N-point SVM instance in D dimensions, the resulting MEB
instance consists of N points in N + D dimensions.

Thus, attention has turned to efficient approximate solutions
for the MEB. A $\delta$-approximate solution to the MEB
($\delta > 1$) is a point c such that
$\max_n d(x_n, c) \le \delta R^*$, where $R^*$ is the radius of
the true MEB solution. For example, a $(1 + \epsilon)$-approximation
for the MEB can be obtained by extracting a very small subset
(of size $O(1/\epsilon)$) of the input called a core-set
[Agarwal et al., 2005], and running an exact MEB algorithm on
this set [Badoiu and Clarkson, 2002]. This is the method
originally employed in the CVM [Tsang et al., 2005]. [Har-Peled
et al., 2007] take a more direct approach, constructing an
explicit core set for the (approximate) maximum-margin
hyperplane, without relying on the MEB formulation. Both these
algorithms take linear training time and require very small
storage. Note that a $\delta$-approximation for the MEB directly
yields a $\delta$-approximation for the regularized cost
function associated with the SVM problem.

Unfortunately, the core-set approach cannot be adapted to
a streaming setting, since it requires $O(1/\epsilon)$ passes
over the training data. Two one-pass streaming algorithms for
the MEB problem are known. The first [Agarwal et al., 2004]
finds a $(1 + \epsilon)$-approximation using
$O((1/\epsilon)^{\lfloor D/2 \rfloor})$ storage and
$O((1/\epsilon)^{\lfloor D/2 \rfloor} N)$ time. Unfortunately,
the exponential dependence on D makes this algorithm
impractical. At the other end of the space-approximation
tradeoff, the second algorithm [Zarrabi-Zadeh and Chan, 2006]
stores only the center and the radius of the current ball,
requiring $O(D)$ space. This algorithm yields a
3/2-approximation to the optimal enclosing ball radius.

4.1 The StreamSVM Algorithm

We adapt the algorithm of [Zarrabi-Zadeh and Chan, 2006] 
for computing an approximate maximum margin classifier. 
The algorithm initializes with a single point (and therefore an 
MEB of radius zero). When a new point is read in off the 
stream, the algorithm checks whether or not the current MEB 
can enclose this point. If so, the point is discarded. If not, the 
point is used to suitably update the center and radius of the 
current MEB. All such selected points define a core set of the 
original point set. 

Let $p_i$ be the input point causing an update to the MEB and
$B_i$ be the resulting ball after the update. From Figure 1, it
is easy to verify that the new center $c_i$ lies on the line
joining the old center $c_{i-1}$ and the new point $p_i$. The
radius $r_i$ and the center $c_i$ of the resulting MEB can be
defined by simple update equations.

$r_i = r_{i-1} + \delta_i$ (4)

Here $2\delta_i = (\|p_i - c_{i-1}\| - r_{i-1})$ is the closest
distance of the new point $p_i$ from the old ball $B_{i-1}$.
Since the new ball must pass through $p_i$ and contain
$B_{i-1}$, the center moves a distance $\delta_i$ toward $p_i$.
Using these, we can define a closed-form analytical update
equation for the new ball $B_i$:

$c_i = c_{i-1} + \frac{\delta_i}{\|p_i - c_{i-1}\|} (p_i - c_{i-1})$



Algorithm 1 StreamSVM

1: Input: examples $(x_n, y_n)_{n \in 1 \ldots N}$, slack parameter C
2: Output: weights (w), radius (R), number of support vectors (M)
3: Initialize: $M = 1$; $R = 0$; $\xi^2 = 1$; $w = y_1 x_1$
4: for n = 2 to N do
5:   Compute distance to center:
       $d = \sqrt{\|w - y_n x_n\|^2 + \xi^2 + 1/C}$
6:   if $d > R$ then
7:     $w = w + \frac{1}{2}(1 - R/d)(y_n x_n - w)$
8:     $R = R + \frac{1}{2}(d - R)$
9:     $\xi^2 = \xi^2 [1 - \frac{1}{2}(1 - R/d)]^2 + [\frac{1}{2}(1 - R/d)]^2$
10:    $M = M + 1$
11:  end if
12: end for
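
The pseudocode translates almost line by line into a short program. The sketch below is one possible reading for the linear kernel, not the authors' implementation: the names `stream_svm` and `predict` and the toy data are assumptions, and the step fraction $\frac{1}{2}(1 - R/d)$ is computed once from the radius before it is updated, which is how we read lines 7 to 9.

```python
import math

def stream_svm(examples, C):
    """Single pass over (x, y) pairs; returns (w, R, M) following Algorithm 1."""
    it = iter(examples)
    x1, y1 = next(it)
    w = [y1 * v for v in x1]                 # line 3: w = y_1 x_1
    R, xi2, M = 0.0, 1.0, 1
    for x, y in it:
        diff = [y * v - wi for v, wi in zip(x, w)]            # y_n x_n - w
        d = math.sqrt(sum(t * t for t in diff) + xi2 + 1.0 / C)
        if d > R:                            # point lies outside the current ball
            s = 0.5 * (1.0 - R / d)          # step fraction, using the old R
            w = [wi + s * t for wi, t in zip(w, diff)]
            R = R + 0.5 * (d - R)
            xi2 = xi2 * (1.0 - s) ** 2 + s ** 2
            M += 1
    return w, R, M

def predict(w, x):
    """Sign of the unbiased linear hypothesis w^T x."""
    return 1 if sum(wi * v for wi, v in zip(w, x)) >= 0 else -1

# Hypothetical, linearly separable toy stream.
data = [((2.0, 1.5), 1), ((-1.5, -2.0), -1), ((1.0, 2.5), 1),
        ((-2.5, -1.0), -1), ((2.2, 2.1), 1), ((-2.0, -2.2), -1)]
w, R, M = stream_svm(data, C=1.0)
```

Only `w`, `R`, `xi2` and the counter `M` persist between examples, which matches the constant-storage claim made for the algorithm.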



Figure 1: Ball updates

It can be shown that, for adversarially constructed data, the
approximation ratio of the radius of the MEB computed by the
algorithm has a lower bound of $(1 + \sqrt{2})/2$ and a
worst-case upper bound of $3/2$ [Zarrabi-Zadeh and Chan, 2006].
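
The geometric update itself is small enough to sketch directly. The code below is an illustration under the assumption of Euclidean points in a fixed dimension: the radius grows by $\delta_i$ as in equation 4, and the center moves by $\delta_i$ along the segment toward the new point. The function name `update_ball` is hypothetical.

```python
import math

def update_ball(center, radius, p):
    """One streaming MEB step: return (center, radius) after seeing point p."""
    d = math.dist(center, p)
    if d <= radius:                  # p is already enclosed: discard it
        return center, radius
    delta = 0.5 * (d - radius)       # half the gap between p and the old ball
    step = delta / d                 # fraction of the way from center to p
    new_center = tuple(c + step * (q - c) for c, q in zip(center, p))
    return new_center, radius + delta

# Unit ball at the origin, new point at distance 3 along the x axis.
center, radius = update_ball((0.0, 0.0), 1.0, (3.0, 0.0))
```

In this example the new ball spans x from -1 to 3, so it touches the far side of the old ball and passes through the new point, which is the smallest ball containing both.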

We adapt these updates in a natural way in the augmented
feature space $\tilde\varphi$ (see Algorithm 1). Each selected
point belongs to the core set for the MEB. The support vectors
of the corresponding SVM instance come from this set. It is
easy to verify that the update equations for the weight vector
(w) and the margin (R) in StreamSVM correspond to the
closed-form center update and the radius update (equation 4)
for the ball, respectively. The $\xi^2$ term in the distance
calculation is included to account for the fact that the
distance computations are being done in the D + N dimensional
augmented feature space $\tilde\varphi$ which, for the linear
kernel case, is given by:

$\tilde\varphi(z_n) = [\, y_n x_n;\; C^{-1/2} e_n \,]^T$ (7)



Also note that, because we perform only a single pass over the
data and the $e_n$ components are all mutually orthogonal, we
never need to explicitly store them. The number of updates
to the weight vector is limited by the number of core vectors
of the MEB, which we have experimentally found to be much
smaller than for other algorithms (such as Perceptron).
The space complexity of StreamSVM is small since only the
weight vector and the radius need be stored.

4.2 Kernelized StreamSVM 

Although our main exposition and experiments are with
linear kernels, it is straightforward to extend the algorithm
to nonlinear kernels. In that case, Algorithm 1, instead of
storing the weight vector w, stores an N-dimensional vector of
Lagrange coefficients $\alpha$ initialized as
$[y_1, 0, \ldots, 0]$. The distance computation in line 5 is
replaced by
$d^2 = \sum_{l,m} \alpha_l \alpha_m k(x_l, x_m) + k(x_n, x_n) - 2 y_n \sum_m \alpha_m k(x_n, x_m) + \xi^2 + 1/C$,
and the weight vector updates in line 7 can be replaced by the
Lagrange coefficient updates
$\alpha_{1:n-1} = \alpha_{1:n-1}(1 - \frac{1}{2}(1 - R/d))$,
$\alpha_n = \frac{1}{2}(1 - R/d)\, y_n$.
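
A kernelized variant can be sketched along the same lines. This is an illustrative reading rather than the authors' code: it stores only the examples whose coefficient is nonzero instead of a full N-dimensional vector, it recomputes the quadratic term $\sum_{l,m} \alpha_l \alpha_m k(x_l, x_m)$ from scratch at every step for clarity, and the names `kernel_stream_svm`, `decision` and `linear` are assumptions.

```python
import math

def kernel_stream_svm(examples, C, kernel):
    """Kernelized single pass; returns stored inputs, their coefficients, and R."""
    sv_x, alpha = [], []
    R, xi2 = 0.0, 1.0
    for x, y in examples:
        if not sv_x:                         # first example: alpha = [y_1]
            sv_x.append(x)
            alpha.append(float(y))
            continue
        ww = sum(a * b * kernel(u, v)
                 for a, u in zip(alpha, sv_x) for b, v in zip(alpha, sv_x))
        wx = sum(a * kernel(u, x) for a, u in zip(alpha, sv_x))
        d = math.sqrt(ww + kernel(x, x) - 2.0 * y * wx + xi2 + 1.0 / C)
        if d > R:
            s = 0.5 * (1.0 - R / d)          # step fraction, using the old R
            alpha = [a * (1.0 - s) for a in alpha]   # shrink old coefficients
            sv_x.append(x)
            alpha.append(s * y)              # coefficient of the new example
            R += 0.5 * (d - R)
            xi2 = xi2 * (1.0 - s) ** 2 + s ** 2
    return sv_x, alpha, R

def decision(sv_x, alpha, kernel, x):
    """Value of the hypothesis at x; the labels are already folded into alpha."""
    return sum(a * kernel(u, x) for a, u in zip(alpha, sv_x))

def linear(u, v):
    return sum(a * b for a, b in zip(u, v))

sv_x, alpha, R = kernel_stream_svm([((1.0, 0.0), 1), ((0.0, 1.0), 1)], 1.0, linear)
```

With the linear kernel the coefficients reproduce the weight vector of the linear version as $w = \sum_m \alpha_m x_m$; caching the quadratic term incrementally would avoid the repeated double sum.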



Algorithm 2 StreamSVM with lookahead L

Input: examples $(x_n, y_n)_{n \in 1 \ldots N}$, slack parameter C,
lookahead parameter $L \ge 1$
Output: weights (w), radius (R), upper bound on number of
support vectors (M)

1: Initialize: $M = 1$; $R = 0$; $\xi^2 = 1$; $S = \emptyset$; $w = y_1 x_1$
2: for n = 2 to N do
3:   Compute distance to center:
       $d = \sqrt{\|w - y_n x_n\|^2 + \xi^2 + 1/C}$
4:   if $d > R$ then
5:     Add example n to the active set: $S = S \cup \{y_n x_n\}$
6:     if $|S| = L$ then
7:       Update w, R, $\xi^2$ to enclose the ball (w, R, $\xi^2$)
         and all points in S
8:       $M = M + L$; $S = \emptyset$
9:     end if
10:  end if
11: end for
12: if $|S| > 0$ then
13:   Update w, R, $\xi^2$ to enclose the ball (w, R, $\xi^2$) and
      all points in S
14:   $M = M + |S|$
15: end if
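
The buffering logic of the lookahead variant can be sketched independently of how the merge step is solved. The code below is an illustration in plain Euclidean space rather than in the augmented feature space, and its merge step is a stand-in: `sequential_merge` absorbs the buffered points one at a time with the closed-form single-point update, whereas the algorithm described here solves a quadratic program of size L. All function names are hypothetical.

```python
import math

def grow(center, radius, p):
    """Closed-form single-point update (the L = 1 case)."""
    d = math.dist(center, p)
    if d <= radius:
        return center, radius
    delta = 0.5 * (d - radius)
    return tuple(c + (delta / d) * (q - c) for c, q in zip(center, p)), radius + delta

def sequential_merge(center, radius, buffer):
    """Stand-in for the size-L QP: absorb buffered points, farthest first."""
    for p in sorted(buffer, key=lambda q: -math.dist(center, q)):
        center, radius = grow(center, radius, p)
    return center, radius

def lookahead_meb(points, L, merge=sequential_merge):
    """Single pass with a buffer of at most L points that fell outside the ball."""
    it = iter(points)
    center, radius, buffer = tuple(next(it)), 0.0, []
    for p in it:
        if math.dist(center, p) > radius:
            buffer.append(p)
            if len(buffer) == L:             # buffer full: merge and reset
                center, radius = merge(center, radius, buffer)
                buffer = []
    if buffer:                               # flush whatever is left at the end
        center, radius = merge(center, radius, buffer)
    return center, radius

pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (-2.0, -2.0),
       (1.0, 5.0), (6.0, 1.0), (-3.0, 2.0)]
center, radius = lookahead_meb(pts, L=3)
```

Because each merge returns a ball that contains the previous ball and every buffered point, the final ball encloses the whole stream; replacing `sequential_merge` with an exact solver changes only the tightness of the result, not the control flow.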



4.3 StreamSVM approximation bounds and 
extension to multiple balls 

It was shown in [Zarrabi-Zadeh and Chan, 2006] that any
streaming MEB algorithm that uses only $O(D)$ storage obtains
a lower bound of $(1 + \sqrt{2})/2$ and an upper bound of
$3/2$ on the quality of the solution (i.e., the radius of the
final MEB). Clearly, this is a conservative approximation and
would affect the obtained margin of the resulting SVM
classifier (and hence the classification performance). In order
to do better in just a single pass, one possible conjecture is
that the algorithm must remember more. To this end, we extended
Algorithm 1 to simultaneously store L weight vectors (or
"balls"). The space complexity of this algorithm is L(D + 1)
floats and it still makes only a single pass over the data. In
the MEB setting, our algorithm chooses, with each arriving data
point (that is not already enclosed in any of the balls), how
the current L + 1 balls (the L balls plus the new data point)
should be merged, resulting again in a set of L balls. At the
end, the final set of L balls is merged to give the final MEB.
A special variant of the L balls case is when all but one of
the L balls are of zero radius. This amounts to storing a ball
of non-zero radius and keeping a buffer of L data points (we
call this the lookahead algorithm, Algorithm 2). Any incoming
point, if not already enclosed in the current ball, is stored
in the buffer. We solve the MEB problem (using a quadratic
program of size L) whenever the buffer is full. Note that
Algorithm 1 is a special case of Algorithm 2 with L = 1, with
the MEB updates available in closed analytical form (rather
than having to solve a QP).

Algorithm 1 takes linear time in terms of the input size.
Algorithm 2, which uses a lookahead of L, solves a quadratic
program of size L whenever the buffer gets full. This step
takes $O(L^3)$ time. The number of such updates is $O(N/L)$
(in practice, it is considerably less than N/L) and thus the
overall complexity for the lookahead case is $O(NL^2)$. For
small lookaheads, this is roughly $O(N)$.
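
The arithmetic behind the $O(NL^2)$ figure can be spelled out with a toy cost model. The sketch below is only a worked example with constant factors ignored; the function name `lookahead_cost` and the chosen values of N and L are assumptions.

```python
def lookahead_cost(N, L):
    """Worst-case count of elementary steps, up to constant factors."""
    max_qp_solves = N // L      # at most one solve per L buffered points
    cost_per_solve = L ** 3     # cubic-time QP on L points
    return max_qp_solves * cost_per_solve

N = 1_000_000
costs = {L: lookahead_cost(N, L) for L in (1, 10, 100)}
# Whenever L divides N, costs[L] equals N * L**2.
```

For L = 10 this gives at most a hundredfold increase over the L = 1 case, which stays linear in N; the worst case assumes every point lands in the buffer, which the text notes is rarely so in practice.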

5 Experiments 

We evaluate our algorithm on several synthetic and real
datasets and compare it against several state-of-the-art SVM
solvers. We use 3 criteria for evaluation: a) Single-pass
classification accuracies compared against a single pass of
online SVM solvers such as the iterative sub-gradient solver
Pegasos [Shalev-Shwartz et al., 2007], LASVM [Bordes et al.,
2005], and Perceptron [Rosenblatt, 1988]. b) Comparison
with CVM [Tsang et al., 2005], which is a batch SVM algorithm
based on the MEB formulation. c) Effect of using lookahead in
StreamSVM. For fairness, all the algorithms used a linear
kernel.

5.1 Single-Pass Classification Accuracies 

The single-pass classification accuracies of StreamSVM and
other online SVM solvers are shown in Table 1. Details of
the datasets used are also shown in Table 1. To get a sense of
how good the single-pass approximation of our algorithm is, we
also report the classification accuracies of the batch-mode
(i.e., all data in memory, and multiple passes) libSVM solver
with a linear kernel on all the datasets. The results suggest
that our single-pass algorithm StreamSVM, using a small
lookahead, performs comparably to the batch-mode libSVM, and
does significantly better than a single pass of other online
SVM solvers.

5.2 Comparison with CVM 

We compared our algorithm with CVM which, like our algorithm,
is based on a MEB formulation. CVM is highly efficient for
large datasets but it operates in batch mode, making one pass
through the data for each core vector. We are interested in
knowing how many passes the CVM must make over the data before
it achieves an accuracy comparable to our streaming algorithm.
For that purpose, we compared the accuracy of our single-pass
StreamSVM against two and more passes of CVM to see how long it
takes for CVM to beat StreamSVM (we note here that CVM requires
at least two passes over the data to return a solution). We
used a linear kernel for both. Figure 2 shows the results on
the MNIST 8vs9 data: it takes several hundred passes of CVM to
beat the single-pass accuracy of StreamSVM. Similar results
were obtained for other datasets but we do not report them here
due to space limitations.



[Figure 2 plot: CVM vs StreamSVM, MNIST data (8 vs 9); x-axis: number of passes of CVM]



Figure 2: MNIST 8vs9 data: Number of passes CVM takes before
achieving accuracy comparable to the single-pass accuracy of
StreamSVM. The X axis represents the number of passes of CVM
and the Y axis represents the classification accuracy.



[Figure 3 plot: error bars on accuracy variations w.r.t. random streaming order (for different L)]




Figure 3: Single-pass with varying lookahead on MNIST 8vs9 data:
Performance w.r.t. random ordering of the stream. The X axis
represents the lookahead parameter and the Y axis represents
classification accuracy. Vertical bars represent the standard
deviations in accuracies for a given lookahead.



5.3 Effect of Lookahead 

We also investigated the effect of doing higher-order looka- 
heads on the data. For this, we varied L (the lookahead pa- 
rameter) and, for each L, tested Algorithm 2 on 100 random 
permutations of the data stream order, also recording the stan- 
dard deviation of the classification accuracies with respect to 



Data Set      Dim  # Train  # Test  libSVM   Perceptron  Pegasos  Pegasos  LASVM  StreamSVM  StreamSVM
                                    (batch)              k = 1    k = 20          Algo-1     Algo-2
------------  ---  -------  ------  -------  ----------  -------  -------  -----  ---------  ---------
Synthetic A   2    20,000   200     96.5     95.5        83.8     89.9     96.5   95.5       97.0
Synthetic B   3    20,000   200     66.0     68.0        57.05    65.85    64.5   64.4       68.5
Synthetic C   5    20,000   200     93.2     77.0        55.0     73.2     68.0   73.1       87.5
Waveform      21   4000     1000    89.4     72.5        77.34    78.12    77.6   74.3       78.4
MNIST (0vs1)  784  12,665   2115    99.52    99.47       95.06    99.48    98.82  99.34      99.71
MNIST (8vs9)  784  11,800   1983    96.57    95.9        69.41    90.62    90.32  84.75      94.7
IJCNN         22   35,000   91,701  91.64    64.82       67.35    88.9     74.27  85.32      87.81
w3a           300  44,837   4912    98.29    89.27       57.36    87.28    96.95  88.56      89.06


Table 1: Single-pass classification accuracies of various algorithms (all using a linear kernel). The synthetic datasets (A, B, C) were generated
using normally distributed clusters, and were of about 85% separability. libSVM, used as the absolute benchmark, was run in batch mode (all
data in memory). StreamSVM Algo-2 used a small lookahead (~10). Note: We make the Pegasos implementation do a single sweep over the
data and have a user-chosen block size k for subgradient computations (we used k = 1 and k = 20, the latter akin to using a lookahead of 20).
Perceptron and LASVM are also run for a single pass and do not need block sizes to be specified. All results are averaged over 20 runs (w.r.t.
random orderings of the stream).



the data-order permutations. Note that the algorithm still
performs a single pass over the data. Figure 3 shows the
results on the MNIST 8vs9 data (similar results were obtained
for other datasets but are not shown due to space limitations).
In this figure, we see two effects. Firstly, as the lookahead
increases, performance goes up. This is to be expected since in
the limit, as the lookahead approaches the data set size, we
will solve the exact MEB problem (albeit at a high
computational cost). The important thing to note here is that
even with a small lookahead of 10, the performance converges.
Secondly, we see that the standard deviation of the result
decreases as the lookahead increases. This shows experimentally
that higher lookaheads make the algorithm less susceptible to
badly ordered data. This is interesting from an empirical
perspective, given that we can show that in theory, no value of
L < N can improve upon the 3/2-approximation guaranteed for
L = 1.

6 Analysis, Open Problems, and Extensions 

There are several open problems that this work brings up: 

1. Are the $(1 + \sqrt{2})/2$ lower bound and the $3/2$ upper
bound on the MEB radius indeed the best achievable in a
single pass over the data?

2. Is it possible to use a richer geometric structure instead 
of a ball and come up with streaming variants with prov- 
ably good approximation bounds? 

We discuss these in some more detail here. 

6.1 Improving the Theoretical Bounds 

One might conjecture that storing more information (i.e., 
more points) would give better approximation guarantees in 
the streaming setting. Although the empirical results showed 
that such approaches do result in better classification accura- 
cies, this is not theoretically true in many cases. 

For instance, in the adversarial stream setting, one can
show that neither the lookahead algorithm nor its more general
case (the multiple balls algorithm) improves the bounds given
by the simple no-lookahead case (Algorithm 1). In particular,
one can prove identical upper and lower bounds for the
lookahead algorithm as for the no-lookahead algorithm. To
obtain the 3/2 upper bound result, one can use a construction
nearly identical to that of [Zarrabi-Zadeh and Chan, 2006],
where $L - 1$ points are packed in a small, carefully
constructed cloud near the boundary of the true MEB.

Alternatively, one can analyze these algorithms in the random
stream setting. Here, the input points are chosen
adversarially, but their order is permuted randomly. The
lookahead model is not strengthened in this setting either: we
can show that both the lower bound for no-lookahead algorithms
and the 3/2 upper bound for the specific no-lookahead algorithm
described generalize. For the former, see Figure 4. We place
$(N - 1)/2$ points around $(0, 1)$, $(N - 1)/2$ points around
$(0, -1)$, and one point at $(1 + \sqrt{2}, 0)$. The algorithm
will only beat the $(1 + \sqrt{2})/2$ lower bound if the
singleton appears in the first L points, where L is the
lookahead used. Assuming the lookahead is polylogarithmic in N
(which must be true for a streaming algorithm), this means
that as $N \to \infty$, the probability of a better bound tends
toward zero. Note, however, that this applies only to the
lookahead model, not to the more general multiple balls model,
where it may be possible to obtain tighter bounds in the random
stream setting.
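
This construction is easy to simulate for the no-lookahead algorithm. The sketch below is an illustration with simplifying assumptions: the cluster points sit exactly at $(0, 1)$ and $(0, -1)$ rather than around them, the cluster size `k` is arbitrary, and the optimal radius $\sqrt{2}$ is taken from the ball centered at $(1, 0)$, which is equidistant from the three locations. The function name `stream_meb` is hypothetical.

```python
import math

def stream_meb(points):
    """No-lookahead streaming MEB; returns the final radius."""
    center, radius = points[0], 0.0
    for p in points[1:]:
        d = math.dist(center, p)
        if d > radius:
            delta = 0.5 * (d - radius)
            center = tuple(c + (delta / d) * (q - c) for c, q in zip(center, p))
            radius += delta
    return radius

k = 50                                     # points per cluster, i.e. (N - 1) / 2
top, bottom = [(0.0, 1.0)] * k, [(0.0, -1.0)] * k
single = (1.0 + math.sqrt(2.0), 0.0)
optimal = math.sqrt(2.0)                   # ball centered at (1, 0)

# Singleton arrives last: both clusters are seen before it.
late = [p for pair in zip(top, bottom) for p in pair] + [single]
# Singleton arrives first.
early = [single] + late[:-1]

ratio_late = stream_meb(late) / optimal
ratio_early = stream_meb(early) / optimal
```

When the singleton arrives after both clusters, the ratio equals $(1 + \sqrt{2})/2 \approx 1.207$; when it arrives first, the ratio drops to roughly 1.14, which is the favorable ordering whose probability vanishes as N grows.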




Figure 4: An adversarially constructed setting. 

6.2 Ellipsoidal Balls 

Instead of using a minimum enclosing ball of points, an
alternative could be to use a minimum volume ellipsoid (MVE)
[Kumar et al., 2005]. An ellipsoid in $\mathbb{R}^D$ is defined
as follows: $\{x : (x - c)^T A (x - c) \le 1\}$ where
$c \in \mathbb{R}^D$, $A \in \mathbb{R}^{D \times D}$, and
$A \succeq 0$ (positive semi-definite).
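
As a quick illustration of this definition (not taken from the paper), the sketch below tests membership in an ellipsoid and shows that a ball of radius r is the special case $A = I / r^2$. The function name `in_ellipsoid` and the example matrices are assumptions.

```python
def in_ellipsoid(x, c, A):
    """True if (x - c)^T A (x - c) <= 1, with A given as a list of rows."""
    v = [xi - ci for xi, ci in zip(x, c)]
    Av = [sum(a * vj for a, vj in zip(row, v)) for row in A]
    return sum(vi * avi for vi, avi in zip(v, Av)) <= 1.0

c = (0.0, 0.0)
ball = [[0.25, 0.0], [0.0, 0.25]]   # A = I / r^2 with r = 2: a ball of radius 2
flat = [[0.25, 0.0], [0.0, 4.0]]    # semi-axes 2 along x and 0.5 along y

inside_ball = in_ellipsoid((0.0, 1.5), c, ball)    # within radius 2
inside_flat = in_ellipsoid((0.0, 1.5), c, flat)    # outside: y extent is only 0.5
along_x = in_ellipsoid((1.5, 0.0), c, flat)        # inside: x extent is still 2
```

The second matrix covers the same extent along x as the ball while being four times narrower along y, which is the extra freedom an ellipsoid offers over a ball.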

Note that a ball, upon inclusion of a new point, expands
equally in all dimensions, which may be unnecessary. On the
other hand, an ellipsoid can have several axes and scales of
variation (modulated by the covariance matrix A). This allows
the ellipsoid to expand only along those directions where
needed. In addition, such an approach can also be seen along
the lines of confidence weighted linear classifiers [Dredze et
al., 2008]. The confidence weighted (CW) method assumes
a Gaussian distribution over the space of weight vectors and
updates the mean and covariance parameters upon witnessing
each incoming example. Just as CW maintains the model's
uncertainty using a Gaussian, an ellipsoid generalization can
model the uncertainty using the covariance matrix A. Recent
work has shown that there exist streaming possibilities for
MVE [Mukhopadhyay and Greene, 2008]. The approximation
guarantees, however, are very conservative. It would be
interesting to come up with improved streaming algorithms
for the MVE case and adapt them for classification settings.

7 Conclusion 

Within the streaming framework for learning, we have presented
an efficient, single-pass $\ell_2$-SVM learning algorithm
using a streaming algorithm for the minimum enclosing ball
problem. We have also extended this algorithm to use a
lookahead to increase robustness against poorly ordered data.
Our algorithm, StreamSVM, satisfies a proven theoretical
bound: it provides a 3/2-approximation to the optimal
solution. Despite this conservative bound, our algorithm is
experimentally competitive with alternative techniques in terms
of accuracy, and learns much simpler solutions. We believe that
a careful study of stream-based learning would lead to
high-quality scalable solutions for other classification
problems, possibly with alternative losses and with tighter
approximation bounds.